<a href = "other_page.html "> some_text </a>
The text in red marks the keywords (or key characters), which are not case sensitive. There may or may not be spaces before and after the "=" key character.
other_page.html is the name of the page linked to.
some_text is some text string.
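As a minimal sketch of how such link tags might be picked out of a page, the Python snippet below uses a regular expression that matches the keywords case-insensitively and allows optional spaces around "=". The names LINK_RE and extract_links are hypothetical, and the sketch assumes the page name is always enclosed in double quotes, as in the example above.

```python
import re

# Assumption: the link target is always wrapped in double quotes.
# "a" and "href" are matched case-insensitively; spaces around "=" are optional.
LINK_RE = re.compile(r'<\s*a\b[^>]*?\bhref\s*=\s*"([^"]*)"', re.IGNORECASE)


def extract_links(html):
    """Return the link targets in document order, with stray spaces stripped."""
    return [target.strip() for target in LINK_RE.findall(html)]


links = extract_links('<a href = "other_page.html "> some_text </a>')
print(links)
```

Everything that is not a link tag is simply never matched, so other tags and plain text are ignored without any extra handling.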
The "linked to" pages may have links to other pages and may even link to the previous page. Let us see an example:
A page index.html has links to
1. department.html
2. students.html
3. faculty.html
department.html has links to
1. cse.html
2. eee.html
3. me.html
4. civil.html
5. index.html
students.html has links to
1. organizations.html
2. facilities.html
3. groups.html
cse.html has links to
1. intro.html
2. location.html
3. alumni.html
4. index.html
Tree view:
index (level 0 page)
|
|_______department (level 1 page)
|       |
|       |_______cse (level 2 page)
|       |       |
|       |       |_______intro (level 3 page)
|       |       |
|       |       |_______location
|       |       |
|       |       |_______alumni
|       |       |
|       |       |_______index
|       |
|       |_______eee
|       |
|       |_______me
|       |
|       |_______civil
|       |
|       |_______index
|
|_______students
|       |
|       |_______organizations
|       |
|       |_______facilities
|       |
|       |_______groups
|
|_______faculty
Your job is to parse HTML documents and retrieve the links from them. If the above HTML files are given as input, your output should be:
Level 0 page:
index.html has links to
1. department.html
2. students.html
3. faculty.html
Level 1 page:
department.html has links to
1. cse.html
2. eee.html
3. me.html
4. civil.html
5. index.html
Level 1 page:
students.html has links to
1. organizations.html
2. facilities.html
3. groups.html
Level 2 page:
cse.html has links to
1. intro.html
2. location.html
3. alumni.html
4. index.html
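The output layout above could be produced by a small formatting helper. The following Python sketch assumes the crawl result is already available as a list of (level, page name, links) tuples in the order they should be printed; format_report and that tuple layout are hypothetical choices, not part of the assignment.

```python
def format_report(entries):
    """Render (level, page_name, links) tuples in the layout shown above."""
    lines = []
    for level, name, links in entries:
        lines.append(f"Level {level} page:")
        lines.append(f"{name} has links to")
        # Links are numbered from 1 in the order they appear in the page.
        lines.extend(f"{i}. {link}" for i, link in enumerate(links, start=1))
    return "\n".join(lines)


print(format_report([
    (0, "index.html", ["department.html", "students.html", "faculty.html"]),
]))
```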
Input:
a. Name of the Level 0 page
b. Maximum number of levels to follow
Output:
As indicated above
Note: There is no limit to the total number of links.
Assumptions:
a. All pages will be placed in the same directory/folder.
b. Page names are case insensitive.
c. The HTML file may contain tags other than the link tag. You should ignore those tags; in fact, you should ignore anything other than the link tag.
d. Links pointing to a lower-level page (for example, the link from department.html back to index.html) should not be followed; otherwise your program will fall into an infinite loop.
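Assumptions a and b together suggest looking a page up in a single directory without regard to letter case. The sketch below shows one possible way to do that in Python; find_page is a hypothetical helper, and returning None for a missing page is an assumption made here, since the assignment does not say how missing files should be handled.

```python
import os


def find_page(directory, page_name):
    """Return the path of page_name inside directory, ignoring letter case.

    Returns None when no matching file exists (an assumption of this sketch).
    """
    wanted = page_name.strip().lower()
    for entry in os.listdir(directory):
        if entry.lower() == wanted:
            return os.path.join(directory, entry)
    return None
```

For example, find_page(".", "INDEX.HTML") would locate a file stored as index.html in the current directory.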
Hints:
Use a queue to hold the names of the pages. Each entry in the queue may contain the following information:
1. Name of the page
2. Level of the page
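The queue hint can be sketched as a breadth-first traversal in Python. This is only one possible reading of the assignment: crawl, get_links and SITE are hypothetical names, the pages are held in an in-memory dictionary instead of files, pages at levels 0 through max_level are reported, pages without links are left out of the report, and a set of already-queued names is used so that links back to earlier pages are never followed.

```python
from collections import deque


def crawl(start_page, max_level, get_links):
    """Breadth-first walk over pages; returns (level, name, links) tuples.

    get_links(name) is a caller-supplied function returning a page's links.
    """
    start = start_page.lower()          # page names are case insensitive
    queue = deque([(start, 0)])         # each entry: (page name, level)
    seen = {start}                      # pages already queued
    report = []
    while queue:
        name, level = queue.popleft()
        links = get_links(name)
        if links:
            report.append((level, name, links))
        if level >= max_level:
            continue                    # do not go deeper than max_level
        for link in links:
            key = link.lower()
            if key not in seen:         # skips links back to earlier pages
                seen.add(key)
                queue.append((key, level + 1))
    return report


# In-memory stand-in for the example pages above.
SITE = {
    "index.html": ["department.html", "students.html", "faculty.html"],
    "department.html": ["cse.html", "eee.html", "me.html", "civil.html", "index.html"],
    "students.html": ["organizations.html", "facilities.html", "groups.html"],
    "cse.html": ["intro.html", "location.html", "alumni.html", "index.html"],
}

report = crawl("index.html", 2, lambda name: SITE.get(name, []))
for level, name, links in report:
    print(level, name, links)
```

Because the queue is first-in, first-out, all level 1 pages are processed before any level 2 page, which gives the level-by-level ordering of the expected output. The seen set guarantees that each page is queued at most once, so the loop always terminates.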
Queue Implementation = 5
Parsing of link tags + file operation = 5
Showing output in desired order = 5
Avoidance of infinite recursion = 5
Overall program efficiency = 5
Total marks = 25